White Wine Quality by Fawza Alzumia

Introduction:

This dataset includes chemical properties of white wine that influence its quality. The dataset contains 13 variables (including the row index X) and 4898 observations. All variables are numeric; only quality is discrete, and the other values are continuous.

## [1] 4898   13
## 'data.frame':    4898 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
##  $ volatile.acidity    : num  0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
##  $ citric.acid         : num  0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
##  $ residual.sugar      : num  20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
##  $ chlorides           : num  0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
##  $ free.sulfur.dioxide : num  45 14 30 47 47 30 30 45 14 28 ...
##  $ total.sulfur.dioxide: num  170 132 97 186 186 97 136 170 132 129 ...
##  $ density             : num  1.001 0.994 0.995 0.996 0.996 ...
##  $ pH                  : num  3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
##  $ sulphates           : num  0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
##  $ alcohol             : num  8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
##  $ quality             : int  6 6 6 6 6 6 6 6 6 6 ...
##        X        fixed.acidity    volatile.acidity  citric.acid    
##  Min.   :   1   Min.   : 3.800   Min.   :0.0800   Min.   :0.0000  
##  1st Qu.:1225   1st Qu.: 6.300   1st Qu.:0.2100   1st Qu.:0.2700  
##  Median :2450   Median : 6.800   Median :0.2600   Median :0.3200  
##  Mean   :2450   Mean   : 6.855   Mean   :0.2782   Mean   :0.3342  
##  3rd Qu.:3674   3rd Qu.: 7.300   3rd Qu.:0.3200   3rd Qu.:0.3900  
##  Max.   :4898   Max.   :14.200   Max.   :1.1000   Max.   :1.6600  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.600   Min.   :0.00900   Min.   :  2.00     
##  1st Qu.: 1.700   1st Qu.:0.03600   1st Qu.: 23.00     
##  Median : 5.200   Median :0.04300   Median : 34.00     
##  Mean   : 6.391   Mean   :0.04577   Mean   : 35.31     
##  3rd Qu.: 9.900   3rd Qu.:0.05000   3rd Qu.: 46.00     
##  Max.   :65.800   Max.   :0.34600   Max.   :289.00     
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  9.0        Min.   :0.9871   Min.   :2.720   Min.   :0.2200  
##  1st Qu.:108.0        1st Qu.:0.9917   1st Qu.:3.090   1st Qu.:0.4100  
##  Median :134.0        Median :0.9937   Median :3.180   Median :0.4700  
##  Mean   :138.4        Mean   :0.9940   Mean   :3.188   Mean   :0.4898  
##  3rd Qu.:167.0        3rd Qu.:0.9961   3rd Qu.:3.280   3rd Qu.:0.5500  
##  Max.   :440.0        Max.   :1.0390   Max.   :3.820   Max.   :1.0800  
##     alcohol         quality     
##  Min.   : 8.00   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.40   Median :6.000  
##  Mean   :10.51   Mean   :5.878  
##  3rd Qu.:11.40   3rd Qu.:6.000  
##  Max.   :14.20   Max.   :9.000
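The output above has the shape of what `dim()`, `str()`, and `summary()` print. A minimal sketch of that inspection follows. So that it runs without the data file, it uses only the first five rows as printed by `str()` above (density is rounded there) and a subset of the columns. The data frame name `wine` is an assumption.

```r
# First five rows copied from the str() output above; only a few columns
# are kept to stay short. `wine` is a hypothetical name for the data frame.
wine <- data.frame(
  X              = 1:5,
  fixed.acidity  = c(7, 6.3, 8.1, 7.2, 7.2),
  residual.sugar = c(20.7, 1.6, 6.9, 8.5, 8.5),
  density        = c(1.001, 0.994, 0.995, 0.996, 0.996),
  alcohol        = c(8.8, 9.5, 10.1, 9.9, 9.9),
  quality        = c(6L, 6L, 6L, 6L, 6L)
)

dim(wine)      # number of rows and columns
str(wine)      # type and first values of each column
summary(wine)  # quartiles and mean per column

# X only numbers the rows, so it can be dropped before analysis.
wine_no_index <- wine[, names(wine) != "X"]
ncol(wine_no_index)
```

On the full file, dropping `X` in the same way would leave 12 of the 13 columns: the 11 chemical properties plus quality.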

Univariate Plots Section

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.800   6.300   6.800   6.855   7.300  14.200

The median of fixed acidity is 6.8. The distribution of fixed acidity is slightly right skewed. There are some outliers in the range above 11.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0800  0.2100  0.2600  0.2782  0.3200  1.1000

The median value of volatile acidity is 0.26. The distribution is right skewed, with a right tail and one peak. There are outliers at values above 0.9.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2700  0.3200  0.3342  0.3900  1.6600

The distribution of citric acid tends to be normal around its main peak, but it has a long right tail and one outlier at about 0.9. The median value is 0.32.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.600   1.700   5.200   6.391   9.900  65.800

The distribution of residual sugar is extremely right skewed. The median is 5.2, while the maximum is 65.8, far above the third quartile of 9.9, so the largest values lie well outside the bulk of the data.
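One common way to check for outliers from a summary like this is Tukey's rule, which flags values more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies it to the residual sugar quartiles and maximum printed above. The helper name `tukey_fences` is hypothetical, and this rule is only one possible definition of an outlier.

```r
# Tukey's rule: values beyond k * IQR from the quartiles are flagged.
# `tukey_fences` is a hypothetical helper name.
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles and maximum of residual sugar, copied from the summary above.
fences <- tukey_fences(q1 = 1.7, q3 = 9.9)
max_sugar <- 65.8

fences
max_sugar > fences[["upper"]]  # TRUE means the maximum lies beyond the upper fence
```

With these quartiles the upper fence is about 22.2, so the maximum of 65.8 is flagged under this rule.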

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600

The distribution of chlorides looks normal around its main peak but has a very long right tail. The median value is 0.043.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   23.00   34.00   35.31   46.00  289.00

The distribution of free sulfur dioxide is right skewed and concentrated around the median of 34. There are a few outliers on the right side of the plot.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     9.0   108.0   134.0   138.4   167.0   440.0

The distribution of total sulfur dioxide is right skewed, with outliers in the higher range above 300.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9871  0.9917  0.9937  0.9940  0.9961  1.0390

The distribution of density is right skewed and concentrated around the median of about 0.99. The plot has some outliers at about 1.01.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.720   3.090   3.180   3.188   3.280   3.820

The distribution of pH is unimodal and normal. The median is 3.18, the 1st Qu. is 3.09, and the 3rd Qu. is 3.28.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2200  0.4100  0.4700  0.4898  0.5500  1.0800

The distribution of sulphates is non-symmetric and has bimodal behavior. It is slightly right skewed, with a right tail. The median is 0.47 and the mean is 0.49.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.00    9.50   10.40   10.51   11.40   14.20

The distribution of alcohol is right skewed with some ups and downs. It has multimodal behavior: we can see 3 peaks at ~9, ~11, and ~12.5.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.878   6.000   9.000

The values are not continuous. The distribution is roughly normal, with one peak in the middle. The 1st Qu. is 5 and the 3rd Qu. is 6. The distance from the minimum to the median is 3, and the distance from the median to the maximum is also 3. There are no outliers.
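The skew descriptions above can be cross-checked against the printed summaries: a mean noticeably above the median is a rough sign of a right tail. The sketch below computes the relative gap for each variable from the median and mean values shown in the output above. It is only a heuristic and does not replace looking at the histograms; the column name `rel_gap` is made up for this example.

```r
# Median and mean per variable, copied from the summary output above.
stats <- data.frame(
  variable = c("fixed.acidity", "volatile.acidity", "citric.acid",
               "residual.sugar", "chlorides", "free.sulfur.dioxide",
               "total.sulfur.dioxide", "density", "pH", "sulphates",
               "alcohol", "quality"),
  median = c(6.8, 0.26, 0.32, 5.2, 0.043, 34, 134, 0.9937, 3.18, 0.47,
             10.40, 6),
  mean   = c(6.855, 0.2782, 0.3342, 6.391, 0.04577, 35.31, 138.4, 0.9940,
             3.188, 0.4898, 10.51, 5.878),
  stringsAsFactors = FALSE
)

# Relative gap between mean and median; positive values hint at a right tail.
stats$rel_gap <- (stats$mean - stats$median) / stats$median

# Variables ordered from the largest to the smallest gap.
stats[order(-stats$rel_gap), c("variable", "rel_gap")]
```

Residual sugar shows the largest gap, which matches its description as the most strongly right-skewed variable, and quality is the only variable whose mean falls below its median.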

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.878   6.000   9.000

To improve the graph, I will set breaks for the x scale and change the bin width.

About 2200 white wines have a quality grade of 6 and around 1450 have a grade of 5, which means that roughly 3650 white wines have a good quality rating. Around 1060 white wines have an excellent rating (above 6), and fewer than 200 have a bad rating (below 5). Overall, white wine has good quality.
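A sketch of both steps, the one-unit bins and the grouping of grades, is given below in base R. The per-grade counts are an assumption: they are taken from the standard UCI white wine file rather than from this report, although they do reproduce the 4898 observations and the mean of 5.878 printed above. The original plot may well have been drawn with a different package.

```r
# Grade counts for grades 3..9. ASSUMPTION: taken from the standard UCI white
# wine file, not from this report; they reproduce n = 4898 and mean 5.878.
grade_counts <- c(`3` = 20, `4` = 163, `5` = 1457, `6` = 2198,
                  `7` = 880, `8` = 175, `9` = 5)
quality <- rep(3:9, times = grade_counts)

# One-unit bins centred on the integer grades: 2.5-3.5, 3.5-4.5, ...
breaks <- seq(min(quality) - 0.5, max(quality) + 0.5, by = 1)
h <- hist(quality, breaks = breaks, xaxt = "n",
          main = "White wine quality", xlab = "Quality grade")
axis(1, at = 3:9)  # one tick per grade

# Group grades with the cut-offs used in the text:
# below 5 = bad, 5 or 6 = good, above 6 = excellent.
group <- cut(quality, breaks = c(-Inf, 4, 6, Inf),
             labels = c("bad", "good", "excellent"))
table(group)
```

Centring the bins on the grades avoids two grades sharing a bar, which can happen with the default bin width on integer data.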

Univariate Analysis

What is the structure of your dataset?

The dataset contains 12 variables (plus the row index X) and 4898 observations. Eleven variables describe the chemical composition of white wine, and one variable, quality, records the final result of that composition.

What is/are the main feature(s) of interest in your dataset?

Quality. I want to know which components raise the quality of white wine.

What other features in the dataset do you think will help support your investigation into your feature(s) of interest?

Alcohol. I think alcohol is the component that contributes most to the quality of wine.

Did you create any new variables from existing variables in the dataset?

No, I did not.

Of the features you investigated, were there any unusual distributions? Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?

The possible values for quality range from 0 to 10, but in our white wine dataset the values range from 3 to 9, which means either there is no extremely bad white wine or there is no data on bad white wine. It also suggests that white wine tends to be more good than bad, since the maximum is 9 and the minimum is 3.

Bivariate Plots Section

##  [1] "X"                    "fixed.acidity"        "volatile.acidity"    
##  [4] "citric.acid"          "residual.sugar"       "chlorides"           
##  [7] "free.sulfur.dioxide"  "total.sulfur.dioxide" "density"             
## [10] "pH"                   "sulphates"            "alcohol"             
## [13] "quality"

Quality Vs Alcohol

It is clear that as alcohol increases, the quality of wine increases.

Density Vs Quality

From the above graph, lower density means higher quality.

Quality Vs Residual Sugar

A very weak relation: a higher quality grade goes with lower residual sugar.

Quality Vs Chlorides

From the above graph, lower chlorides means higher quality of wine.

Density Vs Alcohol

There is a strong relation between alcohol and density: when density decreases, alcohol increases.

Density Vs Residual Sugar

There is a very strong relation: when density increases, residual sugar increases.

pH Vs Fixed Acidity

When fixed acidity increases, pH decreases.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation. How did the feature(s) of interest vary with other features in the dataset?

From the correlations above, we can see that quality is most correlated with alcohol, density, and chlorides, and has a weak relation with residual sugar. There is also a correlation between pH and fixed acidity and between density and alcohol.

Did you observe any interesting relationships between the other features?
(not the main feature(s) of interest)?

Yes, a positive correlation between density and residual sugar, and a negative correlation between alcohol and density.

What was the strongest relationship you found?

The strongest relationship that I found was between density and residual sugar, which had a correlation of more than 0.8.
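Correlations like these can be computed with `cor()`. The sketch below uses only the first five rows printed by `str()` near the top of this report (density is rounded there), so the numbers it produces are not the full-data correlations. It only illustrates the call and the direction of the two density relationships. The name `sample_rows` is hypothetical.

```r
# First five rows copied from the str() output above (density is rounded
# there). Five rows are far too few for real conclusions; illustration only.
sample_rows <- data.frame(
  residual.sugar = c(20.7, 1.6, 6.9, 8.5, 8.5),
  density        = c(1.001, 0.994, 0.995, 0.996, 0.996),
  alcohol        = c(8.8, 9.5, 10.1, 9.9, 9.9)
)

# Pearson correlation matrix of the three variables.
cor_matrix <- cor(sample_rows, method = "pearson")
round(cor_matrix, 2)

# Sign of each relationship with density: +1 is positive, -1 is negative.
sign(cor_matrix["density", c("residual.sugar", "alcohol")])
```

Even in this tiny sample, density rises with residual sugar and falls with alcohol. On the full data frame, assuming it is called `wine`, the call `cor(wine$density, wine$residual.sugar)` would be the one behind the value of more than 0.8 reported above.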

Multivariate Plots Section

The quality of white wine is high when alcohol is high and density is low. I don’t see much effect from residual sugar.

I don’t see much variation in chlorides.

When total sulfur dioxide is low and alcohol is high, the quality is high.

There are too many outliers, and volatile acidity shows no visible impact on quality.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

The most apparent relation is between alcohol and quality. There is also an inverse relation (though with a smaller effect) between quality and density and between quality and total sulfur dioxide.

Were there any interesting or surprising interactions between features?

The interesting part is that only one variable shows a clear impact on white wine quality.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.


Final Plots and Summary

Plot One

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.878   6.000   9.000

Description One

This is the bar chart for the quality variable. The chart shows that quality values range from 3 to 9. The largest number of white wines have quality grade 6, and the smallest number have grade 9.

Plot Two

Description Two

The above plot shows that white wines with high density have a low concentration of alcohol: there is a negative correlation between density and alcohol.

Plot Three

Description Three

The above graph shows that when density decreases and alcohol increases, quality increases. There is a positive correlation between alcohol and quality, and a negative correlation between density and alcohol and between density and quality.

Reflection

The biggest challenge for me was understanding the data, since I have no background in wines, so this was a weakness of this project. I was also surprised that I did not find any strong factor for white wine quality other than alcohol.